IN-LINE INTERRUPT HANDLING AND LOCK-UP FREE TLBs
Authors
Abstract
Title of Thesis: In-line Interrupt Handling and Lockup Free TLBs
Degree Candidate: Aamer Jaleel
Degree and Year: Master of Science, 2002
Thesis directed by: Dr. Bruce L. Jacob, Department of Electrical and Computer Engineering

The effects of the general-purpose precise interrupt mechanisms in use for the past few decades have received very little attention. When modern out-of-order processors handle interrupts precisely, they typically begin by flushing the pipeline to make the CPU available to execute handler instructions. In doing so, the CPU flushes many instructions that have already been brought into the reorder buffer. Some of these instructions may have reached a very deep stage in the pipeline, so the flush discards a significant amount of completed work. In addition, each detected exception costs several cycles and wastes energy in re-fetching and re-executing the flushed instructions.

This thesis concentrates on improving the performance of precisely handling software-managed translation lookaside buffer (TLB) interrupts, one of the most frequently occurring kinds of interrupt. The thesis presents a novel method of in-lining the interrupt handler within the reorder buffer. Since the first-level interrupt handlers of TLBs are usually small, they could potentially fit in the reorder buffer along with the user-level code already there. In that case, the instructions that would otherwise be flushed from the pipe need not be re-fetched and re-executed. In-lining also allows instructions that are independent of the exceptional instruction to continue executing in parallel with the handler code. In-lining the TLB interrupt handler in this way provides lock-up free TLBs.

This thesis proposes the prepend and append schemes of in-lining the interrupt handler into the available reorder buffer space. The two schemes are implemented on a processor with a 4-way out-of-order core similar to the Alpha 21264.
We compare the overhead and performance impact of handling TLB interrupts by the traditional scheme, the append in-lined scheme, and the prepend in-lined scheme. For small, medium, and large memory footprints, the overhead is quantified by comparing the number and pipeline state of instructions flushed, the energy savings, and the performance improvements. We find that lock-up free TLBs reduce the overhead of re-fetching and re-executing instructions by 30-95%, reduce energy consumption and execution time by 5-25%, and reduce the energy wasted by 30-90%.
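The core decision described in the abstract, in-lining the handler when it fits in the free reorder buffer space, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name, its parameters, and the fallback to a traditional flush when the handler does not fit are assumptions made for illustration, and the sketch does not model the difference between the prepend and append schemes.

```python
def handle_tlb_miss(rob_capacity: int, rob_occupied: int, handler_len: int):
    """Toy model of the in-line vs. flush decision (hypothetical names).

    Returns a (mode, flushed) pair, where `flushed` is the number of
    in-flight instructions that are discarded and must be re-fetched.
    """
    free_entries = rob_capacity - rob_occupied
    if handler_len <= free_entries:
        # Handler fits alongside the user-level code already in the
        # reorder buffer, so nothing is flushed.
        return ("inline", 0)
    # Assumed fallback: behave like the traditional scheme and flush
    # every in-flight instruction before running the handler.
    return ("flush", rob_occupied)


# Illustrative numbers only; they are not taken from the thesis.
print(handle_tlb_miss(rob_capacity=80, rob_occupied=60, handler_len=10))
print(handle_tlb_miss(rob_capacity=80, rob_occupied=75, handler_len=10))
```

In the first call the 10-instruction handler fits in the 20 free entries and no work is lost; in the second only 5 entries are free, so the sketch falls back to flushing all 75 in-flight instructions.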
Similar resources
On the Design and Implementation of an Efficient Lock-Free Scheduler
Schedulers for symmetric multiprocessing (SMP) machines use sophisticated algorithms to schedule processes onto the available processor cores. Hardware-dependent code and the use of locks to protect shared data structures from simultaneous access lead to poor portability, the difficulty to prove correctness, and a myriad of problems associated with locking such as limiting the available paralle...
Improving the Precise Interrupt Mechanism of Software-Managed TLB Miss Handlers
The effects of the general-purpose precise interrupt mechanisms in use for the past few decades have received very little attention. When modern out-of-order processors handle interrupts precisely, they typically begin by flushing the pipeline to make the CPU available to execute handler instructions. In doing so, the CPU ends up flushing many instructions that have been brought in to the reord...
Performance results from SALMON, a cluster of Workstations Connected by SCI
SCI (Scalable Coherent Interface, IEEE Standard no.1596) defines a standard for high speed interconnection. We present simple throughput and latency results using different implementation strategies and configurations of SparcStations interconnected using Sbus/SCI adapters. These results are compared with similar results obtained from running TCP/IP over Ethernet and ATM. The Sbus/SCI interface...
Performance and the Single/multi-processor Operating System Process Subsystem
Operating systems depend on process subsystem performance, which we analyze in Choices. We use compile time specialization to optimize both single and multiprocessor performance within a single design. We partition critical sections into two types, and compose independent control mechanisms to produce high performance, specialized locks. We select heavily used process operations and justify new...
Implementation and Performance Evaluation of M-VIA on AceNIC Gigabit Ethernet Card
This paper describes the implementation and performance of M-VIA on the AceNIC Gigabit Ethernet card. The AceNIC adapter has several notable hardware features for high-speed communication, such as jumbo frames and interrupt coalescing. The M-VIA performance characteristics were measured and evaluated based on these hardware features. Our results show that latency and bandwidth improvement can b...